(54) Audio video encoding system with enhanced functionality

(57) A system includes additional information (18) 
together with a video stream, where the additional infor-
mation (18) is related to at least one of the frames (16).
Preferably the additional information (18) is related to an 
object (17a, 17b) within the frame (16). A receiver (82) 
receives the video and additional information (18) and 
decodes the video in the same manner independently of 
whether the additional information (18) is provided. The 
additional information (18) is selectively presented to a 
viewer (238) at approximately the time of receiving the 
frames (16). The system may also present information 
to a viewer (238) from a unitary file (232,332) containing 
an image and additional information (18) associated 
with the image. A selection mechanism permits the 
selection of objects (17a, 17b) in the image for which 
the additional information (18) is related thereto. A pres-
entation mechanism provides the additional information
(18) to a viewer (238) in response to selecting the object 
(17a, 17b). 



[FIG. 1: description stream with sub-blocks 22 and 24 (object index, voice annotation, image features, object links, URL links, Java applets) and a frame index, alongside video frames 16 containing objects 17a and 17b.]






EP 0 982 947 A2



Description 



BACKGROUND OF THE INVENTION 

[0001] The present invention relates to an improved audio, video, and/or image system with enhanced functionality.
[0002] In the current information age viewers are bombarded by vast amounts of video information being presented to them. The video information may be presented to the viewer using many devices, such as for example, broadcast television, cable television, satellite broadcasts, streaming video on computer networks such as the World Wide Web, and video from storage devices such as compact discs, digital video discs, laser discs, and hard drives. People generally view video content in a passive manner with the interaction limited to interactivity typically found on a VCR. Depending on the source of the video and the viewing device, the viewer may have the ability to fast forward, fast reverse, stop, pause, and mute the video. Unfortunately, it is difficult for the viewer to locate specific information within a video or summarize a video without the time consuming task of viewing large portions of the video.

[0003] Existing digital libraries may incorporate techniques that attempt to process the video to create a summary of its content. However, the existing digital library techniques process selected frames as a whole in order to characterize the content of the video. For example, color histograms of selected frames may be used to describe the content of the frames. The resulting color histograms may be further summarized to provide a global measure of the entire video. The resulting information is associated with the respective video as a description thereof. Unfortunately, it is difficult to identify and characterize objects within the image, such as Jeff playing with a blue beach ball on the beach.

BRIEF SUMMARY OF THE INVENTION

[0004] The present invention overcomes the aforementioned drawbacks of the prior art by providing in a first aspect a system that includes additional information together with a video stream, where the additional information is related to at least one of the frames. Preferably the additional information is related to an object within the frame. A receiver receives the video and additional information and decodes the video in the same manner independently of whether the additional information is provided. The additional information is selectively presented to a viewer at approximately the time of receiving the frames.

[0005] In another aspect of the present invention a system for presenting information includes a unitary file containing an image and additional information associated with the image. A selection mechanism permits the selection of objects in the image for which the additional information is related thereto. A presentation mechanism provides the additional information to a viewer in response to selecting the object.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 


[0006] 

FIG. 1 is a depiction of a video and a descriptive stream together with data stored therein.
FIG. 2 is a video image and associated information in accordance with FIG. 1.
FIG. 3 is a system for the video and descriptive stream of FIG. 1.
FIG. 4 is a system for creating and using an image with associated information.
FIG. 5 is an image with associated information.
FIG. 6 illustrates the movement of an image and associated information from one image to another image.
FIG. 7 is an image file format for the system of FIG. 4.
FIG. 8 illustrates an alternative image file structure.
FIG. 9 illustrates image cropping information.

FIG. 10 illustrates a JFIF [illegible] and viewer.
FIG. 11 illustrates viewing a JFIF [illegible] image on a legacy viewer.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0007] The present inventors came to the realization that the presently accepted passive viewing technique for video may be enhanced by incorporating additional information together with the video stream. The additional information may include for example, a description of the content of portions of the video, links within the video to information apart from the video itself, links within the video to other portions of the video, software for computer programs, commands for other related interactivity, object indexes, textual descriptions, voice annotations, image features, object links, URL links, and Java applets. Other information may likewise be included as desired. However, incorporating the additional information within the video stream would in most instances require a new specification to be developed. For example, the MPEG and MPEG-2 standards do not provide for the inclusion of additional information therein other than what is specified in the standard. Modifying such a video encoding technique would result in each viewer desiring to view the modified video being required to obtain a specialized viewer, at additional expense.
[0008] The present inventors came to the further realization that for each video standard that includes the capability of incorporating additional information therein, the particular technique used to incorporate the additional information is dependent on the particular video standard. Unfortunately, if a set of information is developed that relates to a particular video, then for each video standard a different technique is necessary to incorporate the additional information with the video. With the large number of different video standards available it would be burdensome to develop techniques for incorporating the additional information with each video standard.

[0009] In view of the large number of video standards and the difficulty of incorporating such additional information therein, the present inventors came to the further realization that a generally format-independent technique of referencing the additional information is desirable. In addition, a generally format-independent format is more easily repurposed for different types of video formats. Referring to FIG. 1, a description stream 12 containing the additional information is created as a companion for a video sequence 14. The video sequence 14 is composed of a plurality of sequential frames 16. The video may have any suitable format, such as for example analog or digital, interlaced or progressive, and encoded or not encoded. Each frame 16 may include one or more objects of interest 17a and 17b. Portions of the description stream 12 may be associated with any number of frames of the video sequence 14, such as a single frame, a group of sequential frames, a group of non-sequential frames, or the entire video sequence 14, as desired. In the event that a portion of the descriptive stream 12 is associated with a sequential number of frames, that portion of the descriptive stream may be thought of as having a "lifespan."

[0010] The descriptive stream contains additional information about objects, such as 17a and 17b, appearing within one or more of the video frames 16. The descriptive stream 12 includes data blocks 18 where each block is associated with one or more frames 16, and preferably particular objects 17a, 17b within one or more frames 16. Alternatively, the data blocks 18 may be associated with frames 16 as a whole. Each data block 18 preferably includes a frame index 20 at the beginning of the data block to provide convenient synchronization with the associated frame 16. The frame index 20 includes data which identifies the particular frame the following data block is associated with. If the descriptive stream 12 and the video sequence 14 are sufficiently correlated in some manner, such as in time, then the frame index 20 may be unnecessary. In the case of broadcast video, preferably the video sequence 14 and the description stream 12 are time correlated. In the case of computer or digital based broadcasts, the video sequence 14 and the descriptive stream 12 may be transmitted at different time intervals. For example, a large portion of the descriptive stream 12 may be transmitted, and then the associated video sequence 14 may be transmitted.

[0011] The frame indexes 20 are used to synchronize, or otherwise associate, the data blocks 18 of the descriptive stream 12 with the video sequence 14. Each data block 18 may be further divided into a number of sub-blocks 22, 24, containing what are referred to herein as descriptors. Each sub-block 22, 24 corresponds to an individual object of interest within the frame 16. For example, sub-block 22 may correspond to object 17a and sub-block 24 may correspond to object 17b. Alternatively, each of the sub-blocks may correspond to multiple objects of interest. Also, there may be objects in the image that are not defined as objects of interest, and which therefore would not have a sub-block associated therewith. Sub-blocks 22, 24 include a plurality of data fields therein containing the additional information, including but not limited to, an object index field 30, a textual description field 32, a voice annotation field 34, an image feature field 36, an object links field 38, a URL links field 40, and a Java applets field 42. Additional information may be included such as copyright and other intellectual property rights. Some notices, such as copyrights, may be encoded and rendered invisible to standard display equipment so that the notices are not easily modified.
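
The layout of data blocks and sub-blocks described above can be sketched as a pair of record types. This is a minimal illustration, not the encoding used by the invention: the class names, the rectangle form of the object index, and the modelling of a block's "lifespan" as a first and last frame number are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubBlock:
    """Descriptors for one object of interest (sub-blocks 22, 24)."""
    object_index: Tuple[int, int, int, int]  # assumed (x, y, width, height) rectangle
    textual_description: str = ""
    voice_annotation: bytes = b""
    image_features: dict = field(default_factory=dict)
    object_links: List[str] = field(default_factory=list)
    url_links: List[str] = field(default_factory=list)
    java_applets: List[bytes] = field(default_factory=list)


@dataclass
class DataBlock:
    """One data block 18: a frame index 20 followed by its sub-blocks."""
    first_frame: int  # assumed: the frame index is stored as a frame range
    last_frame: int   # equal to first_frame when the block covers one frame
    sub_blocks: List[SubBlock] = field(default_factory=list)

    def covers(self, frame_number: int) -> bool:
        return self.first_frame <= frame_number <= self.last_frame


def blocks_for_frame(stream: List[DataBlock], frame_number: int) -> List[DataBlock]:
    """Return the data blocks whose lifespan includes the given frame."""
    return [block for block in stream if block.covers(frame_number)]


stream = [
    DataBlock(0, 29, [SubBlock((10, 10, 50, 80), "person")]),
    DataBlock(30, 30, [SubBlock((100, 40, 20, 20), "ball")]),
]
active = blocks_for_frame(stream, 30)
```

Here `active` holds only the second block, since the first block's lifespan ends at frame 29.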

[0012] When a viewer is viewing the video sequence 14, a visible or audible indicia is preferably presented to the viewer to indicate that a descriptive stream is associated with a particular sequence of video frames. The viewer may access the additional information using any suitable interface. The additional information is preferably presented to the user using a picture-in-a-picture (PIP) box on the display while the video sequence 14 continues to be presented. The video sequence 14 may be stopped during access of the additional information, if desired. An alternative technique for presenting the additional information to the viewer is to provide the additional information on a display incorporated into a unidirectional or bidirectional remote control unit of the display device or VCR. This allows access to the additional information at a location proximate the viewer. In the case of broadcast video, such as network television broadcasts, if the viewer does not take appropriate actions to reveal the associated information the descriptive stream "dies," and may not, unless stored in a buffer, be revived. In the case that the descriptive stream is part of a video tape, a video disc, or other suitable media, the viewer can "rewind" the video and access an earlier portion of the descriptive stream and display the additional information.
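
One way to picture the buffering mentioned above is a fixed-size queue that keeps only the most recent data blocks of a broadcast. The sketch below is illustrative only; the class name `DescriptiveBuffer`, its capacity parameter, and the use of strings as stand-in data blocks are assumptions, not details from the description.

```python
from collections import deque


class DescriptiveBuffer:
    """Keeps the most recent data blocks so that expired ones can be revived."""

    def __init__(self, capacity: int):
        # A bounded deque silently drops its oldest entry when full.
        self._blocks = deque(maxlen=capacity)

    def push(self, frame_number: int, block: str) -> None:
        self._blocks.append((frame_number, block))

    def revive(self, frame_number: int):
        """Return the buffered block for a frame, or None if it has "died"."""
        for stored_frame, block in self._blocks:
            if stored_frame == frame_number:
                return block
        return None


buffer = DescriptiveBuffer(capacity=2)
for n in range(4):
    buffer.push(n, f"block-{n}")
```

With a capacity of two, the blocks for frames 2 and 3 can still be revived, while those for frames 0 and 1 are gone.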

[0013] The object index field 30 indexes one or more individual objects 17a, 17b within the frame 16. In the case of indexing the frame as a whole, the object index field 30 indexes the frame. The object index field 30 preferably contains a geometrical definition of the object. When a viewer pauses or otherwise indicates a desire to view the additional information for a particular frame, the system processes the object index fields 30 corresponding to that frame, locates the corresponding objects 17a, 17b within the frame, and identifies the corresponding objects in some manner for the viewer such as highlighting them on the display or providing icons. The identified objects are those objects of the frame that have associated information related thereto. If the user selects an identified object, then the system provides the additional information from the corresponding sub-block, preferably with a pop-up menu, to the viewer.
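
The selection step described above amounts to testing a pointer position against each object's geometrical definition. The following sketch assumes, purely for illustration, that the geometry is an axis-aligned rectangle; the names `IndexedObject` and `select_object` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexedObject:
    label: str
    x: int
    y: int
    width: int
    height: int
    info: str  # additional information from the corresponding sub-block

    def contains(self, px: int, py: int) -> bool:
        """True when the point lies inside this object's rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_object(objects: List[IndexedObject], px: int, py: int) -> Optional[IndexedObject]:
    """Return the first indexed object whose rectangle contains the point."""
    for obj in objects:
        if obj.contains(px, py):
            return obj
    return None


frame_objects = [
    IndexedObject("17a", 10, 10, 40, 60, "textual description of object 17a"),
    IndexedObject("17b", 100, 20, 30, 30, "textual description of object 17b"),
]
hit = select_object(frame_objects, 110, 35)
miss = select_object(frame_objects, 0, 0)
```

A point outside every rectangle returns `None`, which corresponds to the viewer selecting a part of the frame that has no associated information.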

[0014] The textual description field 32 preferably includes textual based information related to the object. The textual description field 32 may be similar in nature to traditional closed captioning, but instead is related to particular objects within the frame. The textual description field 32 may be used as the basis of a keyword-based search for relevant video segments. A content-based video search program may search through the textual description fields 32 of the description stream 12 to identify relevant portions of the video sequence(s) 14. With the textual description fields 32 normally related to individual objects within the frames 16 of the video sequence 14, the content-based video search provides actual object-oriented search capability.
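
A keyword-based search over textual description fields can be sketched as follows. The dictionary layout (frame number mapped to a list of per-object descriptions), the sample entries, and the whole-word matching rule are assumptions chosen for the example rather than details given in the description.

```python
from typing import Dict, List, Tuple

# Hypothetical index: frame number -> textual descriptions of objects in that frame.
descriptions: Dict[int, List[str]] = {
    12: ["Jeff playing with a blue beach ball"],
    48: ["red car on a road"],
    97: ["blue beach umbrella", "Jeff sitting"],
}


def search(index: Dict[int, List[str]], query: str) -> List[Tuple[int, str]]:
    """Return (frame, description) pairs that contain every query keyword."""
    keywords = query.lower().split()
    matches = []
    for frame in sorted(index):
        for text in index[frame]:
            words = text.lower().split()
            if all(keyword in words for keyword in keywords):
                matches.append((frame, text))
    return matches


results = search(descriptions, "blue beach")
```

Because each description belongs to one object, a match identifies both the frame and the object within it, which is what distinguishes this from searching frame-level captions.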

[0015] The voice annotation field 34 preferably stores further audio based information regarding the object (or frame), preferably in natural speech. The voice annotation field 34 may include any audio information related to the associated object(s) (or frame(s)).

[0016] The image features field 36 is preferably used to store further information about the characteristics of the object (or frame), such as in terms of its texture, shape, dominant color, and a motion model describing its motion with respect to a certain reference frame. Image features based on objects within the frames of a video sequence may be particularly useful for content-based video image indexing and retrieval for digital libraries.

[0017] The object links field 38 is preferably used to store links to other video objects or frames in the same or different video sequence or image. Object links may be useful for video summarization, and object and/or event tracking.

[0018] Referring also to FIG. 2, the URL links field 40 preferably contains addresses and/or links to external Web pages and/or other objects related to the object that are accessible through an electronic link, such as a computer network. For an object of interest in the scene, such as person 46, the URL link 58 in a sub-block 50 may point to a person's homepage address 52. Any symbol, icon, or portion of the scene may be linked to an external data source, such as a site which contains the related information. Companies may also desire to link products 54 shown in the video sequence, through the URL 58 of a sub-block 56, to an external data source, such as their Web site 60. This provides the potential for customers to learn more about particular products, increases advertising, and may increase sales of the products. The URL links field may also be used to automatically import data and other information from a data source external to the video sequence 14 and the description stream 12 for incorporation with the video sequence 14. In this manner, the video sequence 14 and the description stream 12 may be automatically updated with information from a source external to the video sequence 14 and the description stream 12. The information may be used in any suitable manner, such as being overlaid on the display, added to the video sequence, or used to update the contents of the information fields.

[0019] The Java applets field 42 is preferably used to store Java code to perform more advanced functions related to the respective object(s). For example, a Java applet may be embedded to enable online ordering for a product shown in the video. Also, Java code may be included to implement sophisticated similarity measures to empower advanced content-based video search in digital libraries. Alternatively, any other programming language or coding technique may be used.

[0020] In the case of digital video, the cassettes used for recording in such systems may include a memory, such as solid state memory, which serves as a storage location for additional information. The memory for many such devices is referred to as memory-in-cassette (MIC). Where the video sequence is stored on a digital video cassette, the descriptive stream may be stored in the MIC, or on the video tape. In general, the descriptive stream may be stored along with the video or image contents on the same media. The descriptive stream is maintained separate from the video or image contents so that the video or image decoder does not have to also decode the descriptive stream encoded within the video stream, which is undesirable as previously discussed.

[0021] Referring to FIG. 3, a system 70 generally applicable for a television broadcast system is shown. The system 70 includes a capture mechanism 72, which may be a video camera, a computer capable of generating a video signal, or any other mechanism that is capable of generating and/or providing a video signal. The video signal is provided to an encoder 74, which also receives appropriate companion signals for the various types of additional information 76 which will form the descriptive stream. The encoder 74 generates a combined video stream and descriptive stream signal 78. The combined signal 78 is transmitted by a transmitter 80, which may be a broadcast transmitter, a hard-wire system, or a combination thereof. The combined signal 78 is received by a receiver 82, which separates the two signals and decodes each of the signals for display on a video display 84.
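
The combining and separating of the two signals can be pictured with tagged packets. This sketch is not the broadcast format of the system; the packet tuple, the `"V"` and `"D"` tags, and the function names are assumptions introduced for illustration. It shows that the recovered video is identical whether or not a descriptive stream accompanies it.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed packet form: (kind, frame number, payload); kind is "V" (video) or "D" (descriptive).
Packet = Tuple[str, int, bytes]


def combine(video: List[bytes], description: Dict[int, bytes]) -> List[Packet]:
    """Interleave video frames with any data block attached to the same frame."""
    signal: List[Packet] = []
    for frame_number, frame in enumerate(video):
        if frame_number in description:
            signal.append(("D", frame_number, description[frame_number]))
        signal.append(("V", frame_number, frame))
    return signal


def separate(signal: Iterable[Packet]):
    """Split the combined signal back into video frames and descriptive blocks."""
    video: List[bytes] = []
    description: Dict[int, bytes] = {}
    for kind, frame_number, payload in signal:
        if kind == "V":
            video.append(payload)
        elif kind == "D":
            description[frame_number] = payload
    return video, description


frames = [b"frame0", b"frame1", b"frame2"]
blocks = {1: b"object 17a: textual description"}
decoded_video, decoded_blocks = separate(combine(frames, blocks))
plain_video, no_blocks = separate(combine(frames, {}))
```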

[0022] A trigger mechanism 86 is provided to cause the receiver 82 to decode and display the additional information contained within the descriptive stream in an appropriate manner. A decoder may be provided with the receiver 82 for decoding the embedded descriptive stream. The descriptive stream may be displayed in any suitable location or format such as a picture-in-picture (PIP) format on the video display 84, or a separate descriptive stream display 88. The separate descriptive stream display may be co-located with the trigger mechanism 86, which may take the form of a remote control mechanism for the receiver. Some form of indicia may be provided, such as a visible indicia on the video display or as an audible tone, to indicate that a descriptive stream is present in the video sequence.

[0023] Activating the trigger mechanism 86 when a descriptive stream is present will result in those objects which have descriptive streams associated therewith being highlighted, or otherwise marked, so that the user may select additional information about the object(s). In the case of a separate descriptive stream display, the selection options for the information are displayed in the descriptive stream display, and the device is manipulated to permit the user to select the additional information. The information may be displayed immediately, or may be stored for future reference. Of particular importance for this embodiment is to allow the video display to continue uninterrupted so that others watching the display will not be compelled to remove the remote control from the possession of the user who is seeking additional information.

[0024] In the event that the system is used with an audio and/or video library on a computer system, the capture mechanism, transmitter, and receiver may not be required, as the video or image will have already been captured and stored in a library. The library typically resides on magnetic or optical media which is hard-wired to the display. In this embodiment, a decoder to decode the descriptive stream may be located in the computer system or in the display. The trigger mechanism may include several other selection devices, such as a mouse or other pointing device, and may be incorporated into a keyboard with dedicated keys or by the assignment of a key sequence. The descriptive stream display will likely take the form of a window on the video display or a display on a remote.

[0025] Television stations may utilize the teachings described herein to increase the functionality of broadcasting programs. Television stations may transmit descriptive streams together with regular television signals so that viewers may receive both the television signals and the description streams to provide the advanced functions described herein. The technique for broadcast TV is similar to that of sending out closed caption text along with regular TV signals. Broadcasters have the flexibility of choosing to send or not to send the descriptive streams for their programs. If a receiving TV set has the capability of receiving and decoding the descriptive streams, then the viewer may activate the advanced functions, as desired, in a manner similar to the viewer selecting or activating, as desired, to view closed captioned text. If the viewer activates the advanced functions, the viewer, for example, may read text about someone or something in the programs, listen to voice annotations, access related Web site(s) if the TV set is Web enabled, or perform other tasks such as online ordering or gaming by executing embedded Java applets.

[0026] The descriptive stream for a video sequence may be obtained using a variety of mechanisms. The descriptive stream may be constructed manually using an interactive method. An operator may explicitly select to index certain objects in the video and associate some corresponding additional information. Another example is that the descriptive stream may be constructed automatically using any video analysis tools, especially those developed for the Moving Pictures Experts Group Standard No. 7 (MPEG-7).

[0027] Camcorders, VCRs, DVD recorders, and other electronic devices may be used to create and store descriptive streams while recording and editing. Such devices may include a user interface to allow a user to manually locate and identify desired objects in the video, index the objects, and record corresponding information in the descriptive stream(s). For example, a user may locate an object within a frame by specifying a rectangular region (or polygonal region) which contains the object. The user may then enter text in the textual description field, record speech into the voice annotation field, and associate Web page addresses into the URL links field. The user may associate the additional information with additional objects in the same frame, additional objects in other frames, and other frames, as desired. The descriptions for selected objects may also be used as their audio and/or visual tags.

[0028] If a descriptive stream is recorded along with a video sequence, as described above, the video can be viewed 
later and support all the functions. 

[0029] For digital libraries, the system may be applied to video sequences or images originally stored in any common format, such as RGB, D1, MPEG, MPEG-2, or MPEG-4. If a video sequence is stored in MPEG-4 format, the location information of the objects in the video may be extracted automatically. This alleviates the burden of manually locating the objects. Further, information may be associated with each extracted object within a frame and propagated into other sequential or nonsequential frames, if so selected. When a video sequence or image is stored in a non-object-based format, the mechanism described herein may be used to construct descriptive streams. This enables a video sequence or image stored in one format to be viewed and manipulated in a different format, and to have the description and linking features of the invention applied thereto.

[0030] The descriptive streams facilitate content-based video/image indexing and retrieval. A search engine may find relevant video contents at the object level, by matching relevant keywords against the text stored in the textual description fields in the descriptive streams. The search engine may also choose to analyze the voice annotations, match the image features, and/or look up the linked Web pages for additional information. The embedded Java applets may implement more sophisticated similarity measures to further enhance content-based video/image indexing and retrieval.
[0031] Images are traditionally self contained in a single file and displayed, as desired. For example, HTML files that contain textual data and links to separate image files are frequently employed for Internet based applications. For a single HTML based page of content, an HTML file and several separate image files may be necessary. When transferring HTML based content to a different computer system the associated image files (and other files) must also be located and transferred. Locating and transferring many files for a single HTML page is burdensome and may require knowledge of all the potential image files that may be loaded by the HTML page. Unfortunately, sometimes all the associated files are not transferred, resulting in HTML based content that is not fully functional.

[0032] Many Web page developers devote substantial efforts to the creation of images and associated content, such as advertising, for a professional Web page. The images are frequently copied by unscrupulous Web page developers, without a care as to copyright violations, and reused for different uses. The associated content is discarded and the original Web page developer receives no compensation for the unauthorized use of his/her original image.
[0033] Digital camera systems exist that permit the user to annotate the content of the image file with textual information. Unfortunately, the textual information is overwritten directly on the image file thereby altering the image file itself. This permits recording of associated information with the image file but a portion of the original image content is irreversibly damaged, which is unacceptable to many users. In addition, with the advent of digital cameras many users are discovering that tracking the content of digital images is becoming an increasingly difficult task. Typically the user creates additional files with information that describes the content of the digital image files. Unfortunately, when the additional files are lost the information is lost. Also, if the digital image files are misplaced, then the content in the additional file has little or no value.

[0034] One example of a file format that has been developed by a standardization organization that permits global information to be attached to images is Still Picture Interchange File Format (SPIFF), specified as an extension to the JPEG standard, ISO/IEC IS 10918-3 (Annex F). The specification was developed to permit textual information to be attached to files to facilitate searching of the files. In addition, if the textual information is voluminous then significant bandwidth may be required for transmission across a network and additional storage capability may be needed to store such files. The present inventors came to the realization that the textual information does not provide simple and accurate representations of objects within the image itself.

[0035] In view of the enhanced audio, visual, and textual experience made possible with the described invention with regard to video content, the present inventors came to the further realization that the concepts embodied in the present invention may be extended to images. In contrast to the traditional multiple file system where one file contains the textual content and the other file contains the image, or the SPIFF file format, the present inventors came to the realization that additional information that enhances the image viewing experience may be included together with the image file in a unitary file. The additional information may include audio, video, computer programs, and textual information associated with the image or objects within the image such as descriptions and locations of the objects thereof. In addition, the additional information may be used to manage the images themselves. For example, the additional information may include descriptors, histograms, and indexing information that describe the content of the image itself. With the inclusion of the additional information together with the image file itself, the additional information is not susceptible to becoming lost, misplaced, or deleted. Also, the image files may be managed based on the files themselves as opposed to a separate data file containing information regarding their content. This permits the users to select any set of image files upon which to perform searches without the necessity of having previously obtained descriptions of their content.

[0036] However, the present inventors came to the realization that it is desirable to maintain compatibility with existing image presentation devices and software, such as Photoshop and Web based browsers, while permitting the enhanced functionality with modified image presentation software. To accomplish these objectives the file includes at least two layers in addition to the image itself. The image file itself remains unchanged, or substantially unchanged. The first and second layers are appended to the end of the image file and contain the additional information. In this manner existing image presentation devices and software may simply display the image file and discard the remaining information, while enhanced presentation devices and software may also use the additional appended information.
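
Appending layers after untouched image data can be sketched as below. This is a minimal illustration under stated assumptions and not the file format of the invention: the `ADDLINFO` marker, the fixed-size trailer holding the two layer lengths, and the function names are all hypothetical. The sketch also assumes that a legacy viewer ignores bytes following the image data, which should be verified for any particular viewer.

```python
import struct

MAGIC = b"ADDLINFO"  # hypothetical 8-byte marker, not part of any image standard
TRAILER_SIZE = len(MAGIC) + 8  # marker plus two 4-byte layer lengths


def append_layers(image_bytes: bytes, base_layer: bytes, second_layer: bytes) -> bytes:
    """Append two layers after the unchanged image data, then a fixed-size trailer."""
    trailer = MAGIC + struct.pack(">II", len(base_layer), len(second_layer))
    return image_bytes + base_layer + second_layer + trailer


def split_layers(file_bytes: bytes):
    """Return (image, base_layer, second_layer); layers are None for a plain image."""
    if len(file_bytes) < TRAILER_SIZE or file_bytes[-TRAILER_SIZE:-8] != MAGIC:
        return file_bytes, None, None
    base_len, second_len = struct.unpack(">II", file_bytes[-8:])
    layers_start = len(file_bytes) - TRAILER_SIZE - base_len - second_len
    if layers_start < 0:  # lengths do not fit: treat the file as a plain image
        return file_bytes, None, None
    image = file_bytes[:layers_start]
    base = file_bytes[layers_start:layers_start + base_len]
    second = file_bytes[layers_start + base_len:layers_start + base_len + second_len]
    return image, base, second


image = b"\xff\xd8 stand-in image data \xff\xd9"
unitary = append_layers(image, b"regions", b"text, audio, links")
recovered = split_layers(unitary)
```

Placing the lengths in a trailer at the very end lets an enhanced reader find the layers without parsing the image format itself.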

[0037] Referring to FIG. 4, the preferred image system 100 includes an image 112 that is acquired or otherwise generated. The image may be acquired from any suitable source, such as, for example, an imaging device such as a camera, generated by a computer, or may be an existing image. After acquiring or otherwise selecting the image 112, an object selection 114 function may be performed interactively with the user to define regions of the image that enclose objects of interest. The regions may define any shape or region, such as a circle, ellipse, rectangle, or regular polygon. The regions may be drawn on a display using any input device, such as a pen stylus. A pen stylus is particularly useful for images obtained by a camera or presented by a computer. Alternatively, object selection of the image may be performed on a computer using image analysis software. Textual based and URL link based additional information related to particular objects within an image may be added by a user using an input device, such as a pen or keyboard. Audio annotation related to the image or objects within the image may be obtained in any suitable manner. For example, a microphone integrated or otherwise connected to the camera may allow annotation during the acquisition process. In addition, speech recognition software in the camera may be used to convert audio information to textual information using speech-to-text conversion. The speech-to-text functionality provides a convenient technique of adding textual information especially suitable for cameras that do not provide a convenient interface for entering textual based information. A compression module 115 includes an audio compression mechanism 113a and a data compression mechanism 113b. The audio annotation may be compressed using a standard audio compression technique, and data compression may be provided using a standard data compression technique, if desired. Suitable audio compression may include Delta Pulse Coded Modulation (DPCM), while data compression may include Lempel-Ziv-Welch (LZW).

[0038] A generation of hierarchical data structure module 116 arranges the additional information into at least two
layers, with the first layer referred to as the "base layer", described later. An integration module 117 combines the
content related data containing the additional information together with the image 112, compressed by a compression
module 170 if desired, into a single common file. The combination of the additional information and the image file may be
supported as a native part of a future image file format, such as for example, that which may be adopted by JPEG2000
or MPEG-4. Also, currently existing file formats may be extended to support the additional information. The combined
file is constructed in such a manner that the extension of existing file formats provides backward compatibility in the
sense that a legacy image file viewer using an existing file format may still at least decode and read the image in the
same manner as if the additional information were not included therein. An implementation with separate image and
information files is also within the scope of the present invention. The integrated image and additional information file is
then transmitted or stored at module 118, such as a channel, a server, or over a network.

[0039] Storage may be in any type of memory device, such as a memory in an electronic camera or in a computer. The
combined file containing the image and additional information may be transmitted as a single file via Email or as an
attachment to an Email. If the audio and/or other associated data is compressed, decompression 122 of the audio
and/or data is performed prior to audiovisual realization of the object information 124. Once images and the
hierarchical data structure associated with them are available to users, they may be utilized in an interactive manner.

[0040] An interactive system utilizing the combined file may include the following steps to implement the retrieval and
audiovisual realization of the object information 124 of the combined image file:

(a) retrieve and display the image data;

(b) read the base layer information;

(c) using the base layer information as an overlay generation mechanism, generate an overlay to visually indicate
the regions of the image that contain additional information in terms of "hot spots," according to the region
information contained in the base layer. Hot spots may be automatically highlighted or be highlighted only when a user
selects a location within the region defined by the "hot spot," such as with a pointing device;

(d) display a pop-up menu adjacent, or otherwise on the display, of the object as the user points and selects the hot
spots, where the types of available information for that object are featured in the menus; and

(e) render the additional information selected by the user when the user selects the appropriate entry in the menu.

[0041] It is preferable that the hot spots and pop-up menus (or other presentation techniques) are invoked in response
to a user's request. In this manner, the additional information provided is not intrusive, but instead supplements the
image viewing experience. Steps (a)-(e) are implemented by the audiovisual realization of the object information
module 124, which preferably contains appropriate computer software.
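
Steps (c) and (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Region class, the flag names, and the functions hit_test and menu_items are hypothetical and do not come from the specification. It assumes rectangular regions and one boolean flag per type of additional information, as read from the base layer.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One hot spot from the base layer (hypothetical in-memory form)."""
    x: int
    y: int
    width: int
    height: int
    # Maps an information type to whether the base layer flags it as available.
    flags: dict = field(default_factory=dict)


def hit_test(regions, px, py):
    """Return the first region whose rectangle contains the point, or None."""
    for region in regions:
        inside_x = region.x <= px < region.x + region.width
        inside_y = region.y <= py < region.y + region.height
        if inside_x and inside_y:
            return region
    return None


def menu_items(region):
    """List the information types to show in the pop-up menu for a region."""
    return [name for name, available in region.flags.items() if available]
```

A viewer would call hit_test with the pointer position on a user's request and build the pop-up menu from menu_items only when a region is returned, so nothing is shown until the user asks for it.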

[0042] Content-based image retrieval and editing may also be supported. A search engine 128 permits the user to
locate specific images based on the additional information contained within the image file. Editing is provided by an
object-based image manipulation and editing subsystem 126. Images 112 may be contained in a database which
contains a collection of digital images. Such an image database may also be referred to as a library or a digital library.

[0043] Content-based information retrieval provides users with additional options to utilize and interact with the
images in a dynamic nature. First, the user may select one or more regions or objects of interest in an image to retrieve
further information. Such information may include, for example, links to related Web sites or other multimedia material,
textual descriptions, voice annotations, etc. Second, the user may look for certain images in a database via search
engines. In database applications, images may be indexed and retrieved on the basis of associated information
describing their content. Such content-based information may be associated with images and objects within images and
subsequently used in information retrieval.

[0044] Object-based image editing enables users to manipulate images in terms of the objects contained within the
images. For example, the user may "drag" a human subject in a picture, "drop" it to a different background image, and
therefore compose a new image with certain desired effects. The current invention allows access to outline (contour)
information of objects to enable cutting and dragging objects from one image to another where they may be seamlessly
integrated with a different background. The object-based additional information related to the object is maintained with
the object itself as it is moved or otherwise manipulated. Accordingly the user need only define the outline of an object
once and that outline is maintained together with the object. Preferably the outline is a rough geometric outline that is
defined in the first layer, and a more detailed outline of the object is defined in the second layer (likely containing more
bytes). This two-layer structure permits more efficient transmission of images, because the more precise outline is not
always necessary and is therefore only transmitted to the user upon request. Together, content-based information
retrieval and object-based image editing offer a user a new and exciting experience in viewing and manipulating images.
[0045] In the preferred implementation of the hierarchical data structure the "base layer" includes only content-related
information and has a limited number of bytes. The actual content-related information is contained in the "second layer."
The hierarchical implementation ensures that the downloading efficiency of compressed images is practically intact
even after introducing the additional functionalities, while those functionalities may be fully realized when a user desires.

[0046] Two principal objects accomplished when implementing the content-based information retrieval and object-based
image editing are: (1) an image file that supports such functionalities should be downloadable or otherwise
transferable across a computer system in essentially the same time and stored using essentially the same storage space
as if the additional information were not included; and (2) such functionalities may be fully realized when a user or
application program desires.

[0047] To accomplish the two principal objects the present inventors came to the realization that a multi-layer data
structure is desired, such as two layers. The first layer, referred to herein as the "base layer", contains a limited number
of bytes, such as up to a fixed number. The bytes of the first layer are principally used to specify a number of regions of
interest and store a number of flags which indicate whether certain additional content-related information is available
for a particular region. The second layer (and additional layers) includes the actual content-related information. In a
networking application, initially only the image and the base layer of its associated content-related information are
transmitted. Since the base layer contains only a limited number of bytes, its impact on the time necessary to transmit
the image is negligible.

[0048] Referring to FIG. 5, after initial downloading of an image, a user may view the image 140, and may also decide
to interact with the contents of the image. The interaction may include interacting with an object of interest, such as
character one 142, character two 144, or an object, such as object 146. Alternatively, a region of the image may be
considered as an object of interest. The entire image may also be treated as an object of interest. The user may select
objects of interest using any suitable technique, such as a pointing device. The system presents a pop-up menu 148,
150 (or other presentation technique) which lists the available information related to the selected region or object, based
on the flags stored in the first (base) layer. If the user selects an item from the menu, the system will then start
downloading the related information stored in the second layer from the original source and provide the additional
information to the user. The user may also choose to save a compressed image with or without its content-related
information. When the user chooses to save the image with its content-related information, the flags corresponding to
the available information in the first layer will be set to true, and vice versa.

[0049] An initial set of content-related information, which may be of common interest, includes: (1) links to computer
based information; (2) meta textual information; (3) voice annotation; and (4) object boundary information. Additionally,
(5) security-copyright information; and (6) references to MPEG-7 descriptors, as described in "MPEG-7: Context and
Objectives (Version 4)," ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, N1733, July 1997, may be
displayed. The syntax of Table 1 may be used to support the acquisition of content-related information. Other types of
content-related information may be added to this initial set as necessary to satisfy particular needs. For example,
computer code, for instance written in the Java language, may be added to the list of associated information. In some
cases, the system will open an application if the application is not already running. Such applications may take
any form, such as a word processing application, a Java Applet, or any other application.

Table 1

Base Layer Syntax

Syntax                                  Bits    Mnemonic
num_of_regions                          6       uimsbf
for (n=0; n<num_of_regions; n++){
    region_start_x                      N       uimsbf
    region_start_y                      N       uimsbf
    region_width                        N       uimsbf
    region_height                       N       uimsbf
    link_flag                           1       bslbf
    meta_flag                           1       bslbf
    voice_flag                          1       bslbf
    boundary_flag                       1       bslbf
    security_flag                       1       bslbf
    mpeg7_flag                          1       bslbf
}


[0050] where N = ceil(log2(max{image_width, image_height})).
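
As an illustration of the base layer syntax, the following Python sketch packs and parses the fields of Table 1 at the bit level. It is a minimal sketch, not a definitive implementation: the function names, the dictionary keys, and the zero padding of the final byte are assumptions made here, since the specification does not state how the bit stream is aligned. Fields are written most significant bit first, following the uimsbf and bslbf mnemonics.

```python
FLAG_NAMES = ("link", "meta", "voice", "boundary", "security", "mpeg7")
COORD_NAMES = ("x", "y", "width", "height")


def coord_bits(image_width, image_height):
    """N = ceil(log2(max{image_width, image_height})), in integer arithmetic."""
    return (max(image_width, image_height) - 1).bit_length()


def pack_base_layer(regions, image_width, image_height):
    """Serialize a list of region dicts into base layer bytes."""
    n = coord_bits(image_width, image_height)
    acc, nbits = 0, 0

    def put(value, width):
        nonlocal acc, nbits
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width

    put(len(regions), 6)                       # num_of_regions
    for region in regions:
        for name in COORD_NAMES:               # region_start_x ... region_height
            put(region[name], n)
        for name in FLAG_NAMES:                # six 1-bit flags
            put(int(region[name]), 1)
    pad = -nbits % 8                           # assumed: zero-pad to a whole byte
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def parse_base_layer(data, image_width, image_height):
    """Inverse of pack_base_layer."""
    n = coord_bits(image_width, image_height)
    total = len(data) * 8
    acc = int.from_bytes(data, "big")
    pos = 0

    def get(width):
        nonlocal pos
        pos += width
        return (acc >> (total - pos)) & ((1 << width) - 1)

    regions = []
    for _ in range(get(6)):
        region = {name: get(n) for name in COORD_NAMES}
        region.update({name: bool(get(1)) for name in FLAG_NAMES})
        regions.append(region)
    return regions
```

Note that N bits cannot represent a width or height equal to the full image dimension when that dimension is an exact power of two; the sketch raises an error in that case rather than guessing how such a region would be coded.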



Semantics

[0051]

num_of_regions    The number of regions in an image which may have additional content-related information.
region_start_x    The x coordinate of the upper-left corner of a region.
region_start_y    The y coordinate of the upper-left corner of a region.
region_width      The width of a region.
region_height     The height of a region.
link_flag         A 1-bit flag which indicates the existence of links for a region. '1' indicates there are links
                  attached to this region and '0' indicates none.
meta_flag         A 1-bit flag which indicates the existence of meta information for a region. '1' indicates there
                  is meta information and '0' indicates none.
voice_flag        A 1-bit flag which indicates the existence of voice annotation for a region. '1' indicates there
                  is voice annotation and '0' indicates none.
boundary_flag     A 1-bit flag which indicates the existence of accurate boundary information for a region. '1'
                  indicates there is boundary information and '0' indicates none.
security_flag     A 1-bit flag which indicates the existence of security-copyright information for a region. '1'
                  indicates there is such information and '0' indicates none.
mpeg7_flag        A 1-bit flag which indicates the existence of references to MPEG-7 descriptors for a region. '1'
                  indicates there is MPEG-7 reference information and '0' indicates none.



[0052] The syntax for the first layer requires only a limited number of bytes. For example, with 256 bytes the base layer
may define at least 26 regions anywhere in an image whose size may be as large as 65,536 x 65,536 pixels. In contrast,
to define 4 regions in any image, the base layer merely requires 38 bytes.
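
The order of magnitude of these figures can be checked with a short Python calculation that counts only the fields printed in Table 1 (a 6-bit region count, then four N-bit coordinates and six 1-bit flags per region). The function names are hypothetical, and rounding up to whole bytes is an assumption. For a 65,536 x 65,536 image, this count gives 29 regions in 256 bytes and 36 bytes for 4 regions, which is consistent with "at least 26" and slightly below the 38 bytes quoted above; any overhead beyond the Table 1 fields is not described in the text and is not modeled here.

```python
def base_layer_bits(num_regions, image_width, image_height):
    """Bit count of the Table 1 fields for the given number of regions."""
    n = (max(image_width, image_height) - 1).bit_length()  # N = ceil(log2(max))
    return 6 + num_regions * (4 * n + 6)


def base_layer_bytes(num_regions, image_width, image_height):
    """Bit count rounded up to whole bytes (assumed alignment)."""
    return (base_layer_bits(num_regions, image_width, image_height) + 7) // 8


def max_regions(budget_bytes, image_width, image_height):
    """Largest region count that fits the budget; a 6-bit count caps at 63."""
    n = (max(image_width, image_height) - 1).bit_length()
    return min(63, (budget_bytes * 8 - 6) // (4 * n + 6))
```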

[0053] The second layer contains the actual content-related information which, for each region, may include, for exam- 
ple, links, meta information, voice annotation, boundary information, security-copyright information, and MPEG-7 refer- 
ence information. Other descriptions related to the image to enhance the viewing or management thereof may be 
included, as desired. The high-level syntax of Table 2 may be used to store the above information in the second layer. 



Table 2

Second Layer Syntax

Syntax                                  Bits    Mnemonic
for (n=0; n<num_of_regions; n++){
    links()
    meta()
    voice()
    boundary()
    security()
    mpeg7()
    end_of_region                       16      bslbf
}


[0054] The links and meta information are textual data and require lossless coding. The voice information may be
coded using one of the existing sound compression techniques such as delta pulse coded modulation (DPCM), if
desired. The boundary information may utilize the shape coding techniques developed in MPEG-4, "Description of Core
Experiments on Shape Coding in MPEG 4 Video," ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio,
N1584, March 1997. The security-copyright information may utilize any suitable encryption technique. MPEG-7
contains reference information to additional types of links.

[0055] The precise syntax and format for each type of the above-identified content-related information may be
determined during the course of file format development for future standards, and are presented herein merely as
examples of the system and technique of the present invention. In general, however, the syntax structure of Table 3
may be used.



Table 3

Second Layer Syntax

Syntax                                  Bits    Mnemonic
type_of_info                            8       bslbf
length_of_data                          16      uimsbf
data()


Semantics

[0056]

links()           The sub-syntax for coding links.
meta()            The sub-syntax for coding meta information.
voice()           The sub-syntax for coding voice annotation.
boundary()        The sub-syntax for coding boundary information.
security()        The sub-syntax for coding security-copyright information.
mpeg7()           The sub-syntax for coding MPEG-7 reference information.
end_of_region     A 16-bit tag to signal the end of content-related information for a region.
type_of_info      An 8-bit tag to uniquely define the type of content-related information. The value of this
                  parameter may be one of a set of numbers defined in a table which lists all types of
                  content-related information such as links, meta information, voice annotation, boundary
                  information, security-copyright information, and MPEG-7 reference information.
length_of_data    The number of bytes used for storing the content-related information.
data()            The actual syntax to code the content-related information. This may be determined on the basis of
                  application requirements, or in accordance with the specifications of a future file format that
                  may support the hierarchical data structure as one of its native features.
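
The type/length/data structure of Table 3, closed by the end_of_region tag of Table 2, can be sketched in Python as follows. Several details are assumptions made only for this illustration and are not defined by the specification: the numeric type codes, the value 0xFFFF for end_of_region, and the rule that type code 0xFF is reserved so that the end tag can never be mistaken for the start of a record. The function names are hypothetical.

```python
import struct

# Hypothetical type codes; the text only says they come from a table of types.
INFO_TYPES = {"links": 1, "meta": 2, "voice": 3,
              "boundary": 4, "security": 5, "mpeg7": 6}
END_OF_REGION = b"\xff\xff"   # assumed 16-bit tag value


def encode_region(items):
    """Encode (type_of_info, payload) pairs for one region, then the end tag."""
    out = bytearray()
    for type_id, payload in items:
        if not 0 <= type_id < 0xFF:
            raise ValueError("type_of_info must be 0..254 (0xFF is reserved here)")
        if len(payload) > 0xFFFF:
            raise ValueError("payload exceeds the 16-bit length_of_data field")
        # 8-bit type, 16-bit big-endian length, then the data bytes.
        out += struct.pack(">BH", type_id, len(payload)) + payload
    return bytes(out) + END_OF_REGION


def decode_region(buf, offset=0):
    """Decode one region; return (items, offset just past end_of_region)."""
    items = []
    while buf[offset:offset + 2] != END_OF_REGION:
        type_id, length = struct.unpack_from(">BH", buf, offset)
        offset += 3
        items.append((type_id, buf[offset:offset + length]))
        offset += length
    return items, offset + 2
```

Because every record carries its own length, a reader that does not recognize a type code can skip that record and continue with the next one.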



[0057] Associating additional information, such as voice annotations and URL links, to regions and/or objects in an
image allows a user to interact with an image in ways not previously obtainable. Referring again to FIG. 5, an example
of an image presentation with the enhanced functionality is presented. The application reads the image data as well as
the base layer of information. The application then displays the image on the display and visually indicates the "hot
spots" via an overlay on the image, according to the region information in the base layer. The user selects a region
and/or object of interest. A pop-up menu 148 appears which lists items that are available for the selected region and/or
object (more than one may be available). When the user selects the voice annotation item, for example, the application
will then locate the audio information in the second layer and play it back using a default sound player application 154.
If the user selects a link which is a URL link 150 to a Web site 152, the system will then locate the address and display
the corresponding Web page in a default Web browser. A link may also point to another image file or even point to
another region and/or object in an image. Similarly, additional meta information may also be retrieved and viewed (in a
variety of different formats) by the user by selecting the corresponding item in the menu. Using this technique, different
regions and/or objects in the same image may have different additional information attached thereto. The user is able
to hear different voices corresponding to different characters in the image, for instance. Individual Web pages (or other
associated information obtained via a computer network) may also be attached directly to more relevant components in
the scene.

[0058] When editing images it is desirable to cut, copy, and paste in terms of objects having arbitrary shapes. The
proposed technique supports such functionality provided additional shape information is available in the file. Referring
to FIG. 6, an example is shown where, using the boundary information 160 associated with a baby object 162, a user
may copy the baby object 162 and place it into a different background 164, thus moving one computer-generated image
into another computer-generated image. In addition, the attributes related to the baby object 162 are maintained, such as
audio. The sequence of actions may happen in the following order. The user first selects the baby object 162 and the
system provides a pop-up menu 166. The user then selects the boundary item 168, which is generated by a boundary
generation mechanism in the system. The system then loads the boundary information from level 2 and highlights the
baby object, as indicated by the bright line about the object. The user may then cut and paste 170 (or otherwise
relocate) or perform a drag and drop type 172 of action from the edit menu 170 (copy).

[0059] By associating descriptors to images, such as MPEG-7 descriptors, the images may be retrieved based on
their audio and/or visual contents by advanced search engines. The descriptors may include color, texture, shape, as
well as keywords. In general, an image only needs to carry minimal reference information which points to other
description streams, such as MPEG-7 description streams.

[0060] An integrated approach to support the advanced functionality of content-based information retrieval and
object-based image editing has been disclosed. The technique employs a two-layer (or more) hierarchical data structure
to store the content-related information. The first layer includes coordinates which specify regions of interest in
rectangular shape and flags which indicate whether certain additional content-related information is available for the
specified regions. The actual content-related information is stored in the second layer where one may find, for example,
links, meta information, audio annotation, boundary information, security-copyright information, and MPEG-7 reference
information for each specified object and/or region.

[0061] With the first layer having a limited number of bytes, the downloading time necessary to obtain the file and the
storage necessary for the image and first layer are minimized, unless the user or application explicitly requests
additional content-related information from the second (or additional) layer. On the other hand, should the user require
such information, the proposed technique also guarantees it may be fully delivered by the file itself containing the
remaining information.

[0062] The existing JPEG compressed image file formats, such as still picture interchange file format (SPIFF) or
JPEG File Interchange Format (JFIF), do not inherently support object-based information embedding and interactive
retrieval of such information. Although creating, experiencing, and utilizing information enhanced images may be
performed using the system of the current invention, it may also be desirable that the information enhanced images
created by the current invention may at least be decoded and displayed by legacy viewers using any standard format,
such as JFIF or SPIFF. Indeed, the legacy systems will not be able to recognize and utilize the associated information.
The goal for this aspect of the present invention is therefore to guarantee successful image decoding and display by a
legacy system without breaking down the system.

[0063] If backward compatibility with legacy viewers, such as those that utilize JFIF and SPIFF file formats, is a
necessity, the disclosed hierarchical data structure may be encapsulated into a JFIF or SPIFF file format. Examples of
such encapsulations that may be implemented by module 117 in FIG. 4 are given below.

[0064] The JFIF file format is described in Graphics File Formats: Second Edition, by J. D. Murray and W. VanRyper,
O'Reilly & Associates Inc., 1996, pp. 510-515. Referring now to FIG. 7, a JFIF file structure 190 contains JPEG data
196 and an End Of Image (EOI) marker 194. A JFIF viewer simply ignores any data that follows the EOI marker 194.
Hence, if the 2-layer hierarchical data structure 192 disclosed herein is appended to a JFIF file immediately after EOI
194, the legacy viewers will be able to decode and display the image, ignoring the additional data structure. A system
constructed according to the present invention may appropriately interpret the additional data and implement the
interactive functionalities of the invention.
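
The appending scheme can be illustrated with a short Python sketch. It assumes the standard JPEG marker values (SOI is 0xFFD8, EOI is 0xFFD9, SOS is 0xFFDA) and walks the marker segments to locate the EOI marker, rather than searching for the last 0xFFD9 byte pair, because the appended data may itself contain those bytes. The function names are hypothetical, and the sketch is not a complete JPEG parser.

```python
def find_eoi(data):
    """Return the offset just past the EOI marker of the JPEG stream in data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos, n = 2, len(data)
    while pos + 1 < n:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte before a marker
            pos += 1
            continue
        pos += 2
        if marker == 0xD9:                      # EOI
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue                            # standalone markers carry no length
        if pos + 2 > n:
            break
        # The 2-byte big-endian length counts itself plus the payload.
        pos += int.from_bytes(data[pos:pos + 2], "big")
        if marker == 0xDA:                      # SOS: skip entropy-coded data
            while pos + 1 < n:
                nxt = data[pos + 1]
                # 0xFF00 is a stuffed byte and 0xFFD0-0xFFD7 are restart markers.
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
    raise ValueError("EOI marker not found")


def append_layers(jpeg_bytes, layer_bytes):
    """Place the layer data immediately after the EOI marker."""
    return jpeg_bytes[:find_eoi(jpeg_bytes)] + layer_bytes


def split_layers(file_bytes):
    """Return (image bytes up to and including EOI, appended bytes)."""
    end = find_eoi(file_bytes)
    return file_bytes[:end], file_bytes[end:]
```

A legacy viewer reads only the first part returned by split_layers, while an enhanced viewer also interprets the second part.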

[0065] Using SPIFF, the hierarchical data structure may be encapsulated using a private tag, known to the system of
the present invention. Since a legacy viewer will ignore non-standard tags and associated information fields, according
to the SPIFF specification, images may be successfully decoded and displayed by SPIFF-compliant legacy systems.
The system of the present invention recognizes and appropriately utilizes the added data to enable its interactive
functionalities. (SPIFF is described in Graphics File Formats: Second Edition, by J. D. Murray and W. VanRyper,
O'Reilly & Associates Inc., 1996, pp. 822-837.)

[0066] The method may be applied to any existing computing environment. If an image file is stored on a local disk,
the proposed functionalities may be realized by a stand-alone image viewer or any application which supports such
functionalities, without any additional system changes. If the image file is stored remotely on a server, the proposed
functionalities may still be realized by any application which supports such functionalities on the client side, including an
image parser module on the server. The server includes an image parser because the additional content-related
information resides in the same file as the image itself. When a user requests certain content-related information
regarding a selected region and/or object in an image, e.g., its meta information, it is important that the system fetches
only the relevant information and presents it to the user, preferably as fast as possible. To achieve this objective, the
server parses the image file, locates, and transmits relevant content-related information to the client.
[0067] To implement the aforementioned additional functionality without the enhancement of the present invention,
each piece of content-related information is stored in a separate file, as shown in FIG. 8, generally at 180. Therefore,
for each defined region, as many as six files which contain links, meta information, voice annotation, boundary
information, security-copyright information, and MPEG-7 reference information may be required. For a given image, say
my_image.jpg, a directory called my_image.info which contains content-related information for N defined regions is
created and stored in:

region01.links
region01.meta
region01.voice
region01.boundary
region01.security
region01.mpeg7
...
region0N.links
region0N.meta
region0N.voice
region0N.boundary
region0N.security
region0N.mpeg7

[0068] Using separate files to store additional information is fragile and messy in practice. A simple mismatch
between the file names due to a name change would cause the complete loss of the content-related information.

[0069] The present invention has several advantages over the known prior art, such as, for example: (1) it is
object-based and thus flexible; (2) it allows for inclusion of object feature information, such as object shape boundary;
(3) it has a hierarchical data structure and hence it does not burden those applications that choose not to download and
store image-content related information; (4) it allows audiovisual realization of object-based information, at users'
request; (5) it allows for inclusion of URL links and hence provides an added dimensionality to enjoyment and utilization
of digital images (the URL links may point to web pages related to the image content, such as personal web pages,
product web pages, and web pages for certain cities, locations, etc.); and (6) it is generic and applicable to any image
compression technique as well as to uncompressed images. The present invention also provides object-based
functionalities to forthcoming compression standards, such as JPEG 2000. Although prior file formats do not inherently
support the system disclosed herein, techniques for implementing the system in a backward compatible manner, where
legacy systems may at least decode the image data and ignore the added information, have been disclosed.

[0070] Data structures configured in the manner described in the present invention may be downloaded over a
network in a selective fashion. The downloading application checks with the user interactively to determine whether the
user desires to download and store the content information. If the user says "No," the application retrieves only the
image data and the base layer, and sets the flags in the base layer to zero, indicating that there is no content
information with the image.

[0071] The method and system also support scalable image compression/decompression algorithms. In
quality-scalable compression, images may be decoded at various different quality levels. In spatial scalable
compression, the image may be decoded at different spatial resolutions. In the case of compression algorithms that
support scalability, only the region information and object contour need to be scaled to support spatial scalability. All
other types of data stay intact.

[0072] JPEG compressed images are commonly formatted in the JPEG File Interchange Format (JFIF). The present
inventors further determined that JFIF may be extended, resulting in a new file format where object based information
embedding is enabled using the two-layer (or more) data structure. The resulting extended file format is referred to as
JFIF(+). A preferred system for generating and viewing JFIF(+) files is depicted in FIG. 10. JFIF(+) is viewable with
legacy JPEG/JFIF viewers. FIG. 11 depicts the backward compatibility of JFIF(+) with legacy JPEG viewers.
[0073] The present inventors came to the realization that additional information types, such as JPL_FINISHINFO, are
useful for containing information and instructions to a photo finisher (including, for example, cropping, paper types and
settings), especially useful, for example, for on-line ordering of prints. A particular example of this application is depicted
in FIG. 9. JFIF(+) includes a provision for storing digital ink information, and information about a user's viewing patterns
of images (e.g., frequency of viewing, etc.). The history allows the system to develop user preferences and a data base
to provide appropriate images upon request. Also, this alleviates a "page zero" dilemma, because images may be
provided from a data base according to the user preferences without the viewer having viewed any of them. An
application of JFIF(+) is enhanced image Email where personalized audiovisual information may be embedded for
different objects in the picture and then played back by the receiver.

[0074] JFIF(+) is an extension to the already established JFIF file format. JFIF(+) adds support for node based image
outline objects and the linking of these objects to various other data types such as URLs, sound files, executables,
textual descriptions and custom application defined data. This additional information may be used to create an interactive
environment, offer advanced object based editing functions, and to retrieve information based on content.

[0075] The original JFIF format allows for only a limited number of application extensible markers, each of a limited
size. The JFIF(+) information of the present invention is added to the end of the JFIF file. This file structure offers
flexibility and maintains compatibility with standard JFIF decoders.

[0076] The additional information in the JFIF(+) format is divided into two layers (or more): a first layer (Layer 1),
containing basic information necessary to render the JFIF(+) interface, and a second layer (Layer 2), containing the actual
information linked to the objects in the image. By dividing the data into these two layers (or more) it is possible for low
bandwidth devices to download only the small first layer and then, based on user feedback, download the additional
data that the user requests. When the server lacks the capability to provide such interaction, the entire file may be
loaded.



Table 4 
File Organization 
JFIF Data 
JFIF(+) First Layer 
JFIF(+) Second Layer 

[0077] The JFIF(+) information follows the EOI marker specified in the standard JFIF format. This requires a partial
parsing of the original JFIF file in order to find the EOI marker. The first layer of the JFIF(+) information identifies the
additional information as JFIF(+) data and contains a minimum of information about the defined objects. This
information includes a rectangular region (or other definition) defining the object's position in the image and an
identifier defining the type of data contained in the object.
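
By way of illustration only, the partial parsing needed to locate the EOI marker may be sketched as follows. The sketch assumes a well-formed JPEG stream; the function names find_jfif_end and _skip_entropy_data are hypothetical and do not come from the specification. Marker segments are skipped by their length fields, so that an EOI belonging to a thumbnail embedded in an application segment is not mistaken for the end of the main image.

```python
import struct


def _skip_entropy_data(buf: bytes, pos: int) -> int:
    """Return the offset of the first real marker after entropy-coded data."""
    while True:
        i = buf.find(b"\xff", pos)
        if i < 0 or i + 1 >= len(buf):
            raise ValueError("EOI marker not found")
        nxt = buf[i + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos = i + 2          # stuffed byte or restart marker: still scan data
        elif nxt == 0xFF:
            pos = i + 1          # fill byte preceding a marker
        else:
            return i             # a real marker starts here


def find_jfif_end(buf: bytes) -> int:
    """Return the offset just past the EOI marker of the main JPEG stream."""
    if buf[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    pos = 2
    while pos + 1 < len(buf):
        if buf[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = buf[pos + 1]
        if marker == 0xFF:                       # fill byte
            pos += 1
            continue
        if marker == 0xD9:                       # EOI
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                             # markers without a length field
            continue
        if pos + 4 > len(buf):
            raise ValueError("truncated marker segment")
        (seg_len,) = struct.unpack_from(">H", buf, pos + 2)
        pos += 2 + seg_len                       # length counts its own two bytes
        if marker == 0xDA:                       # SOS: entropy-coded data follows
            pos = _skip_entropy_data(buf, pos)
    raise ValueError("EOI marker not found")
```

Any bytes at or beyond the returned offset would then be candidates for the JFIF(+) first layer, while a legacy decoder simply stops reading at the EOI marker.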



Table 5
First Layer

Item            Size                          Description
identifier      16 bits                       A unique value to identify a JFIF+ file. Always contains $D0.$07.
version         8 bits (uimsbf)               Version of this JFIF+ file. Contains 0.01 for this version of JFIF(+).
length          32 bits (uimsbf)              The total length of the first layer information (including identifier).
numOfObjects    16 bits (uimsbf)              The number of objects in the JFIF(+) information.
for(i=0;i<numOfObjects;i++){
numOfData       16 bits (uimsbf)              Number of data items associated with this object.
x               16 bits (uimsbf)              X starting position of object's rectangular region (set to 0 for data
                                              items that are not associated with a specific region).
y               16 bits (uimsbf)              Y starting position of object's rectangular region (set to 0 for data
                                              items that are not associated with a specific region).
width           16 bits (uimsbf)              Width of object's rectangular region (set to 0 for data items that are
                                              not associated with a specific region).
height          16 bits (uimsbf)              Height of object's rectangular region (set to 0 for data items that
                                              are not associated with a specific region).
ID              numOfData*16 bits (uimsbf)    Array of type identifiers for the data objects associated with the
                                              region (type information to follow).
}
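
By way of illustration only, a reader for the first layer of Table 5 may be sketched as follows. Several points are assumptions not settled by the table: the identifier is taken to be the byte 0xD0 followed by 0x07, all fields are read most significant byte first (uimsbf), and the 8-bit version is returned as a raw value because its encoding of "0.01" is not specified. The names ObjectEntry and parse_first_layer are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

JFIF_PLUS_ID = b"\xd0\x07"   # assumed byte order of the identifier field


@dataclass
class ObjectEntry:
    x: int
    y: int
    width: int
    height: int
    type_ids: List[int]      # one 16-bit type identifier per data item


def parse_first_layer(buf: bytes, offset: int = 0) -> Tuple[int, int, List[ObjectEntry]]:
    """Parse the first layer; return (version, length, objects)."""
    if buf[offset:offset + 2] != JFIF_PLUS_ID:
        raise ValueError("JFIF(+) identifier not found")
    # version (8 bits), length (32 bits), numOfObjects (16 bits)
    version, length, num_objects = struct.unpack_from(">BIH", buf, offset + 2)
    pos = offset + 9
    objects = []
    for _ in range(num_objects):
        num_data, x, y, width, height = struct.unpack_from(">5H", buf, pos)
        pos += 10
        type_ids = list(struct.unpack_from(f">{num_data}H", buf, pos))
        pos += 2 * num_data
        objects.append(ObjectEntry(x, y, width, height, type_ids))
    return version, length, objects
```

Because the length field covers the whole first layer, a low-bandwidth client could read only the first nine bytes, learn how many more bytes to fetch, and defer the second layer entirely.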








[0078] Table 5, in essence, defines the regions of the image that may contain additional data. The identifier field
per[...] the file as a JFIF(+) file. The length field [...]

[0079] The second layer of the JFIF(+) structure contains the data [...]
in the order that they [...]


Table 6
Format of Second Layer

Item              Size                   Description
length                                   Total length of the second layer.
offsetArray[n]    [...] bits (uimsbf)    Array of offsets from the end of the header to the start of each data
                                         item.
data                                     Start of object data.






Table 7
Defined Data Types

Type              Value         Description
JPL_BOUNDARY      1             Detailed boundary information for the object (format follows).
JPL_META          2             Meta tags as defined for HTML. Content creators may either add many
                                individual META tags or add one set of text containing many META tags.
JPL_AIFF_SOUND    3             AIFF format sound data.
JPL_URL           4             URL text.
JPL_TEXT          5             Text annotation (it is recommended that text falling into one of the
                                predefined META tag definitions be entered in a META field).
JPL_HTML          6             HTML page to be rendered within the object (if the parser supports META
                                tags, it should also look here for META information).
JPL_JAVA          7             A Java applet (when including any executable, requirements information
                                should be included in a JAVAREQ).
JPL_JAVAREQ       8             A null terminated text string containing information for the user
                                concerning the executable's requirements.
JPL_HISTOGRAM     9             Color histogram information (format follows).
JPL_ENVINFO       10            A data structure containing information about the conditions under which
                                the image was created.
JPL_FINISHINFO    11            A data structure containing information for a photo finisher to use in
                                reproducing the image.
JPL_DATE          12            ISO C 26 Character Format null terminated string containing the date of
                                creation.
JPL_EDITDATE      13            ISO C 26 Character Format null terminated string containing last date
                                edited.
JPL_SPRITE        14            A JFIF image to be drawn on top of the main image at the object's location.
JPL_AUTHOR        15            A null terminated string containing author information.
JPL_COPYRIGHT     16            A null terminated string containing copyright information.
JPL_PROTECTED     17            A structure containing password protected encrypted data.
JPL_INK           18            A digital ink structure to be drawn on top of the main image at the
                                object's location.
JPL_USEINFO       20            A structure containing information about how the image has to be viewed.
JPL_RESERVED      -1999         Reserved for further extension.
JPL_USER          2000-65535    For proprietary use by software vendors.
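
By way of illustration only, the type identifiers of Table 7 may be represented as an enumeration. The values 4, 5, and 6 for JPL_URL, JPL_TEXT, and JPL_HTML are an assumption inferred from their position between 3 and 7, since those cells are not legible in the table; the lower bound of the reserved range is likewise not legible, so any value below 2000 that is not explicitly defined is treated here as reserved. The names JplType and classify_type are hypothetical.

```python
from enum import IntEnum


class JplType(IntEnum):
    BOUNDARY = 1
    META = 2
    AIFF_SOUND = 3
    URL = 4          # assumed value (not legible in the table)
    TEXT = 5         # assumed value (not legible in the table)
    HTML = 6         # assumed value (not legible in the table)
    JAVA = 7
    JAVAREQ = 8
    HISTOGRAM = 9
    ENVINFO = 10
    FINISHINFO = 11
    DATE = 12
    EDITDATE = 13
    SPRITE = 14
    AUTHOR = 15
    COPYRIGHT = 16
    PROTECTED = 17
    INK = 18
    USEINFO = 20


def classify_type(type_id: int) -> str:
    """Map a 16-bit type identifier to a Table 7 name."""
    if not 0 <= type_id <= 0xFFFF:
        raise ValueError("type identifier must fit in 16 bits")
    if type_id >= 2000:
        return "JPL_USER"                 # proprietary vendor range
    try:
        return "JPL_" + JplType(type_id).name
    except ValueError:
        return "JPL_RESERVED"             # not defined: reserved for extension
```

A parser built this way can skip data items whose type it classifies as JPL_RESERVED or JPL_USER without failing, which is one way to remain tolerant of later extensions.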
Table 8
JPL_BOUNDARY Data Format

Item              Size                Description
NumOfVerticies    16 bits (uimsbf)    The total number of vertices in the boundary representation.
x                 16 bits (uimsbf)    x position of starting vertex.
y                 16 bits (uimsbf)    y position of starting vertex.
for(i=0;i<numOfObjects;i++){
dx[n]             8 bits (uimsbf)     x offset from previous vertex.
dy[n]             8 bits (uimsbf)     y offset from previous vertex.
}
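
By way of illustration only, the differential vertex coding of Table 8 may be decoded as sketched below. The sketch rests on assumptions that the table does not settle: one (dx, dy) pair is read for each vertex after the starting vertex, the pairs are interleaved, and each 8-bit offset is interpreted as a signed value so that a closed outline can move in any direction. The name decode_boundary is hypothetical.

```python
import struct
from typing import List, Tuple


def decode_boundary(buf: bytes, offset: int = 0) -> List[Tuple[int, int]]:
    """Rebuild absolute vertex positions from a start vertex plus offsets."""
    num_vertices, x, y = struct.unpack_from(">3H", buf, offset)
    pos = offset + 6
    vertices = [(x, y)]
    for _ in range(max(num_vertices - 1, 0)):
        # assumption: offsets are signed 8-bit values, stored as dx then dy
        dx, dy = struct.unpack_from(">bb", buf, pos)
        pos += 2
        x += dx
        y += dy
        vertices.append((x, y))
    return vertices
```

Storing 8-bit offsets instead of 16-bit absolute coordinates roughly halves the size of a detailed outline, at the cost of limiting the distance between consecutive vertices.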
Table 9
JPL_HISTOGRAM Format

Item              Size               Description
colorSpaceID      8 bits (uimsbf)    The color space identification code, e.g., RGB, HSV, etc.
uSize             8 bits (uimsbf)    The number of bins along the first color axis, e.g., R.
vSize             8 bits (uimsbf)    The number of bins along the second color axis, e.g., G.
wSize             8 bits (uimsbf)    The number of bins along the third color axis, e.g., B.
for(u=0;u<uSize;u++){
for(v=0;v<vSize;v++){
for(w=0;w<wSize;w++){
count[u][v][w]    8 bits (uimsbf)    The total number of pixels in the image which are in color (u,v,w).
}
}
}
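
By way of illustration only, a JPL_HISTOGRAM payload following Table 9 may be produced as sketched below. The sketch assumes 8-bit RGB input pixels and uniform bins, and it saturates each count at 255 because the count field is only 8 bits wide; the table itself does not say how larger counts are handled. The colorSpaceID value passed by the caller is hypothetical, since no code assignments are given, and the names build_histogram and pack_histogram are likewise hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]


def build_histogram(pixels: Iterable[Pixel], u_size: int, v_size: int,
                    w_size: int) -> List[List[List[int]]]:
    """Count pixels per (u, v, w) bin, saturating each count at 255."""
    counts = [[[0] * w_size for _ in range(v_size)] for _ in range(u_size)]
    for r, g, b in pixels:
        u = r * u_size // 256          # uniform binning of an 8-bit channel
        v = g * v_size // 256
        w = b * w_size // 256
        counts[u][v][w] = min(counts[u][v][w] + 1, 255)
    return counts


def pack_histogram(color_space_id: int, counts: List[List[List[int]]]) -> bytes:
    """Serialize the header fields and counts in the loop order of Table 9."""
    u_size, v_size, w_size = len(counts), len(counts[0]), len(counts[0][0])
    out = bytearray([color_space_id, u_size, v_size, w_size])
    for plane in counts:               # u varies slowest
        for row in plane:              # then v
            out.extend(row)            # w varies fastest
    return bytes(out)
```

With two bins per axis, for example, the payload is four header bytes followed by eight counts, which keeps the feature small enough to travel with the image.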
Table 10
JPL_ENVINFO Format

Item            Size                Description
cameraID        strlen+1            A text string containing the camera's ID.
flashMode       8 bits (uimsbf)     0 - off, 1 - on, other values are camera specific.
shutterSpeed    32 bits (uimsbf)    Shutter speed in nanoseconds.
fStop           8 bits (uimsbf)     Fstop setting.
indoor          8 bits (uimsbf)     0 - indoor, 1 - outdoor, other values are camera specific.
focalLength     16 bits (uimsbf)    Focal length of lens in millimeters.
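
By way of illustration only, a JPL_ENVINFO structure laid out as in Table 10 may be read as sketched below. The sketch assumes that cameraID is a null terminated ASCII string (consistent with its size of strlen+1) and that the remaining fields follow it immediately, most significant byte first. The names EnvInfo and parse_envinfo are hypothetical.

```python
import struct
from typing import NamedTuple


class EnvInfo(NamedTuple):
    camera_id: str
    flash_mode: int          # 0 = off, 1 = on, other values camera specific
    shutter_speed_ns: int    # shutter speed in nanoseconds
    f_stop: int
    indoor: int              # 0 = indoor, 1 = outdoor, other values camera specific
    focal_length_mm: int


def parse_envinfo(buf: bytes, offset: int = 0) -> EnvInfo:
    """Read the variable-length camera ID, then the fixed-size fields."""
    end = buf.index(b"\x00", offset)              # terminator of cameraID
    camera_id = buf[offset:end].decode("ascii")   # assumed encoding
    fields = struct.unpack_from(">BIBBH", buf, end + 1)
    return EnvInfo(camera_id, *fields)
```

Because the string comes first and has no length prefix, the fixed-size fields can only be located after scanning for the terminator.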
Table 11
JPL_FINISHINFO Format

Item           Size                Description
paperSize      8 bits (uimsbf)     The paper size.
paperType      8 bits (uimsbf)     The paper type (glossy, matte, etc.).
printEffect    8 bits (uimsbf)     The print effect (oil paint, impressionist, etc.).
cropX          16 bits (uimsbf)    Crop and zoom x position.
cropY          16 bits (uimsbf)    Crop and zoom y position.
cropW          16 bits (uimsbf)    Crop and zoom width.
cropH          16 bits (uimsbf)    Crop and zoom height.
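
By way of illustration only, the fixed-size JPL_FINISHINFO structure of Table 11 may be written and read back as sketched below. The numeric codes used for paper size, paper type, and print effect are hypothetical, since the table does not assign values; only the field widths and their order are taken from the table. The names pack_finishinfo and unpack_finishinfo are likewise hypothetical.

```python
import struct

# Three 8-bit fields followed by four 16-bit fields, most significant byte first.
_FINISHINFO = struct.Struct(">3B4H")


def pack_finishinfo(paper_size: int, paper_type: int, print_effect: int,
                    crop_x: int, crop_y: int, crop_w: int, crop_h: int) -> bytes:
    """Serialize printing instructions for a cropped region."""
    return _FINISHINFO.pack(paper_size, paper_type, print_effect,
                            crop_x, crop_y, crop_w, crop_h)


def unpack_finishinfo(buf: bytes, offset: int = 0) -> tuple:
    """Return the seven fields in the order of Table 11."""
    return _FINISHINFO.unpack_from(buf, offset)
```

An authoring tool could call pack_finishinfo with the coordinates of a user-drawn rectangle, and a photo finisher's reader could recover them with unpack_finishinfo.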
Table 12
JPL_PROTECTED Format

Item           Size                Description
passwordKey    strlen+1            The encryption key for the data.
ID             16 bits (uimsbf)    The type identifier for the data object associated with the region.
data                               Start of encrypted object data.
Table 13
JPL_USEINFO Format

Item        Size                Description
times       16 bits (uimsbf)    The number of times an image has been viewed (no roll over).
time        32 bits (uimsbf)    The number of seconds an image has been viewed (no roll over).
width       16 bits (uimsbf)    The width at which the image was viewed.
height      16 bits (uimsbf)    The height at which the image was viewed.
date        strlen+1            ISO C 26 Character Format null terminated string containing the last date
                                the photo was viewed.
linkNext    strlen+1            Full path and name of the next image viewed.
linkPrev    strlen+1            Full path and name of the previous image viewed.



[0080] It is noted that information other than the types of information discussed herein may be incorporated into a
JFIF(+) framework. In addition, data formats for the types of information described herein may be expanded to include
more details. A design similar to JFIF(+) may also be made for images that are compressed by techniques other than
JPEG.

[0081] Referring now to FIG. 9, an image 210 illustrates a possible application of the disclosed image file format. This
particular application is on-line ordering of a high-quality output print of a digital image. The proposed file format
provides additional flexibility in ordering prints on line. The user may specify a region 212, surrounded by dashed
lines, to be zoomed, cropped, and printed. Referring now to FIG. 10, the technique depicted generally at 220 includes a
method for generating JFIF(+) files 222 and a method for viewing JFIF(+) files 224. Generating JFIF(+) files 222 starts
with a JPEG file 226. Using an authoring tool 228, a user 230 draws a rectangular region 212 on image 210, and then
inputs information that is stored in the JPL_FINISHINFO field in order to provide printing instructions to the photo
finisher. The authoring application automatically reads the coordinate and size information of the region and places
them in the JPL_FINISHINFO field. The user then transfers the resulting file 232, generated by a JFIF(+) file generator
234, to a service provider. The service provider uses a reader application 224, which contains a JFIF(+) parser 236,
extracts the cropping and printing instructions, and executes the order. The result may be viewed in a JFIF(+) viewer
238, also referred to herein as an enhanced JFIF interface. In this example, the first layer of the file contains the
position information for the region of interest and the second layer contains the region specific information.

[0082] An enhanced JFIF interface allows the user to identify the image objects that contain information and discover
the types of information using the basic information contained in the first layer. Through the enhanced JFIF interface
the user can access particular information, contained in layer 2, linked to a particular object.
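
By way of illustration only, the object identification performed by such an interface may be sketched as a hit test against the rectangular regions of the first layer. The sketch assumes that regions with zero width or height (data items not tied to a specific region) are never selected by pointing. The names Region and objects_at are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    x: int
    y: int
    width: int
    height: int
    type_ids: Tuple[int, ...]   # type identifiers listed in the first layer


def objects_at(regions: List[Region], px: int, py: int) -> List[int]:
    """Return the indices of all objects whose rectangle contains (px, py)."""
    return [
        i for i, r in enumerate(regions)
        if r.x <= px < r.x + r.width and r.y <= py < r.y + r.height
    ]
```

The type identifiers of each hit tell the viewer what kinds of information are available before any second-layer data has been fetched.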

[0083] Alternatively, the JPL_FINISHINFO field may not be used. The user, for instance, may attach textual information
to the specified region by invoking the JPL_TEXT. The textual information may state "zoom and crop this region and
make two prints, one 4x6 and one 5x7, both printed on matte paper." In yet another variation, the user may choose to
express the order description via voice input by invoking the sound field.

[0084] FIG. 11 depicts how a JFIF(+) file 332 may be input to a JPEG/JFIF legacy viewer 340, which will display the
conventional portion of the image to user 330. The added features of the JFIF(+) file will not be available to the user of
the legacy viewer, but the basic image will still be usable.

[0085] The terms and expressions which have been employed in the foregoing specification are used therein as terms
of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding
equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention
is defined and limited only by the claims which follow.


Claims 

1. A method of associating additional information (18) with a video including a plurality of frames (16) comprising:

(a) identifying at least one of said frames (16);

(b) providing a descriptive stream (12) separate from said video;

(c) including said additional information (18) in said descriptive stream (12) related to said at least one of said
frames (16);

(d) providing said video for displaying on a display (84); and

(e) selectively providing said additional information (18) to a viewer (238) at approximately the time of said
providing said video.

2. The method of claim 1 characterized in that said additional information (18) includes at least one of an object
index (30), a textual description (32), a voice annotation (34), an image feature (36), an object link (38), a URL link
(40), and a Java applet (42).

3. The method of claim 1 characterized in that said identifying is an object (17a, 17b) within said frame (16).

4. The method of claim 1 where said descriptive stream (12) is related to a plurality of said frames (16).

5. The method of claim 4 characterized in that said at least one of said frames (16) are in sequential order in said
video.

6. The method of claim 4 characterized in that said at least one of said frames (16) are in nonsequential order in said
video.

7. A method of claim 3 characterized in that said additional information (18) is related to said object (17a, 17b).

8. The method of claim 1 characterized in that said descriptive stream (12) includes an index synchronizing said
video with said descriptive stream (12).

9. The method of claim 1 characterized in that said descriptive stream (12) includes copyright information.
10. The method of claim 1 characterized in that said descriptive stream (12) is encoded separately from said video.

11. The method of claim 10 characterized in that said video is decoded in the same manner independently of whether
said descriptive stream (12) is provided.

12. The method of claim 11 characterized in that said video is at least one of MPEG-2 and television broadcast
format.

13. The method of claim 1 characterized in that said additional information (18) is presented to said viewer (238) on
a remote control.

14. The method of claim 1 characterized in that an audible signal indicates the availability of said additional
information (18).

15. The method of claim 1 characterized in that a visual signal indicates the availability of said additional information
(18).

16. The method of claim 7 characterized in that said additional information (18) includes textual based information
related to said object (17a, 17b).

17. The method of claim 7 characterized in that said additional information (18) includes textual based information
related to said object (17a, 17b).

18. The method of claim 7 characterized in that said additional information (18) includes image features (36)
comprising at least one of texture, shape, dominant color, and a motion model related to said object (17a, 17b).

19. The method of claim 7 characterized in that said additional information (18) includes links to at least one of other
objects (17a, 17b) and frames (16) within said video.

20. The method of claim 7 characterized in that said additional information (18) includes program instructions related
to said object (17a, 17b).

21. A video system comprising:

(a) an encoder (74) that includes additional information (18) within a video stream including a video including
a plurality of frames (16), where said additional information (18) is related to at least one of said frames (16);

(b) a receiver (82) that receives said video and said additional information (18), and said receiver (82) decodes
said video in the same manner independently of whether said additional information (18) is provided;

(c) a display (84) for displaying said video; and

(d) a trigger mechanism (86) for selectively presenting said additional information (18) to a viewer (238) at
approximately the time of presenting said frames (16) to said viewer (238).

22. The system of claim 21, further comprising:

(a) a transmitter (80) for transmitting said video signal and said additional information (18); and

(b) a receiver (82) for receiving said video signal and said additional information (18).

23. The system of claim 22 characterized in that said encoder (74) is at least one of a video camera and a computer.

24. The system of claim 21 characterized in that said trigger mechanism (86) is located in a remote control device.

25. The system of claim 21 characterized in that said additional information (18) is provided by a remote control
device.

26. The method of claim 21 characterized in that said additional information (18) is related to an object (17a, 17b)
within said frame (16) and includes links to at least one of other objects (17a, 17b) and frames (16) within said
video.
27. The method of claim 21 characterized in that said additional information (18) is related to an object (17a, 17b)
within said frame (16) and includes program instructions related to said object (17a, 17b).

28. The method of claim 21 characterized in that said additional information (18) is related to an object (17a, 17b)
within said frame (16) and includes textual based information related to said object (17a, 17b).

29. The method of claim 21 characterized in that said additional information (18) is related to an object (17a, 17b)
within said frame (16) and includes audible information related to said object (17a, 17b).

30. The method of claim 21 characterized in that said additional information (18) is related to an object (17a, 17b)
within said frame (16) and includes image features (36) comprising at least one of texture, shape, dominant color,
and a motion model related to said object (17a, 17b).



31. A system for presenting information comprising:

(a) a unitary file (232, 332) containing an image and additional information (18) associated with said image;

(b) a selection mechanism that permits the selection of objects (17a, 17b) in said image for which said
additional information (18) is related thereto; and

(c) a presentation mechanism that provides said additional information (18) to a viewer (238) in response to
selecting said object (17a, 17b).

32. The system of claim 31 characterized in that said file (232, 332) includes said image followed by said additional
information (18).

33. The system of claim 32 characterized in that said image and said additional information (18) are separated by a
marker (194) indicating the end of said image.

34. The system of claim 33 characterized in that an image viewer (340) which does not recognize said additional
information (18) will display said image properly and recognize said marker (194) as indicating the end of said
image.

35. The system of claim 34 characterized in that said image is in a JPEG format.

36. The system of claim 31 characterized in that said additional information (18) is organized in at least two layers
comprising:

(a) a first layer containing information describing the location of objects (17a, 17b) within said image; and

(b) a second layer containing additional information (18) regarding said objects (17a, 17b) within said image,
where said first layer contains fewer bytes than said second layer.

37. The system of claim 36 characterized in that said second layer follows said first layer, which in turn follows said
image file (232, 332).

38. The system of claim 36 characterized in that said first layer contains a length identifier describing the length of
said first layer.

39. The system of claim 36 characterized in that said first layer contains a number of objects identifier describing the
number of objects identified by said first layer.

40. The system of claim 36 characterized in that said first layer contains a number of data identifier describing the
number of data items associated with a particular said object (17a, 17b).

41. The system of claim 36 characterized in that said first layer contains a first definition of the outline of an object
(17a, 17b) of said image.

42. The system of claim 36 characterized in that said second layer contains a length identifier describing the length
of said second layer.
43. The system of claim 36 characterized in that said second layer contains an array of offsets that identify the start
of each data item.

44. The system of claim 41 characterized in that said second layer contains a second definition of the outline of said
object (17a, 17b) of said image, where said second definition more closely approximates the outline of said object
(17a, 17b) than said first definition.

45. The system of claim 41 characterized in that said second layer contains a second definition of the outline of said
object (17a, 17b) of said image, where said second definition contains more bytes than said first definition.

46. The system of claim 36 characterized in that said second layer includes sound data related to said object (17a,
17b).

47. The system of claim 36 characterized in that said second layer includes HTML meta tags related to said object
(17a, 17b).

48. The system of claim 36 characterized in that said second layer includes textual annotations related to said object
(17a, 17b).

49. The system of claim 36 characterized in that said second layer includes an HTML page to be rendered.

50. The system of claim 36 characterized in that said second layer includes a Java applet related to said object (17a,
17b).

51. The system of claim 36 characterized in that said second layer includes a color histogram.

52. The system of claim 36 characterized in that said second layer includes data related to the conditions under which
said image was created including at least one of lighting, camera settings, and time of acquisition.

53. The system of claim 36 characterized in that said second layer includes data related to information for
reproducing said image including at least one of cropping information, paper type, camera settings, and image
production settings.

54. The system of claim 36 characterized in that said second layer includes another image to be superimposed upon
said image.

55. The system of claim 36 characterized in that said second layer includes data regarding the author of said image.

56. The system of claim 36 characterized in that said second layer includes copyright data regarding the copyright of
said image.

57. The system of claim 56 characterized in that said copyright data is encoded.

58. The system of claim 36 characterized in that said second layer includes information regarding how said image
should be viewed.

59. The system of claim 36 characterized in that said first layer is transmitted from a first computer to a second
computer together with said image.

60. The system of claim 59 characterized in that portions of said second layer are transmitted from said first computer
to said second computer upon request by said first computer.

61. The system of claim 60 characterized in that said request is in response to a user selecting an object within said
image.

[Drawing sheet: additional information 18 for objects 17a and 17b, with entries 22 and 24 listing object index, textual
description, voice annotation, image features, object links, URL links, and Java applets.]

[Drawing sheet: per-object entries (object index, textual description, voice annotation, image features, object links,
URL links, Java applets) whose URL links lead to an actress homepage (other movie appearances, Oscar awards,
backstage, etc.) and a company homepage (product guide, awards, more popcorn, etc.).]

[Drawing sheet: system 70 with trigger mechanism, combined video/descriptive stream signal, transmitter 80,
receiver 82, and video display 84.]

[FIG. 4: flowchart 100 with blocks for image, object selection and object-based information input, image
compression, sound compression, generation of hierarchical data structure, data compression, integration into
common file, transmission/storage, image decompression and display, audiovisual realization of the object
information, and object based image manipulation and editing.]

[Drawing sheet: file layout consisting of JPEG data, EOI marker 194, and hierarchical data structure.]

[FIG. 10: generating JFIF(+) files 222 (JPEG file 226, authoring tool 228, user 230, JFIF(+) file 232) and interactive
viewing of JFIF(+) files 224 (JFIF(+) file 232, JFIF(+) parser 236, JFIF(+) viewer 238, user 230).]

[FIG. 11: JFIF(+) file 332, viewer 340, user 330 viewing the image.]



(19) Europaisches Patentamt
     European Patent Office
     Office europeen des brevets

(11) EP 0 982 947 A3

(12) EUROPEAN PATENT APPLICATION

Date of publication A3: [illegible]

(51) Int Cl.7: H04N 7/24

Date of publication A2: 01.03.2000 Bulletin 2000/09

Application number: [illegible]

Date of filing: [illegible] 1999

(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV MK RO SI

(30) Priority: 24.08.1998 US 97738 P
               29.03.1999 US 280421

(71) Applicant: Sharp Kabushiki Kaisha
     Osaka-Shi Osaka (JP)

(72) Inventors:
     Borden, George
     Vancouver WA 98664 (US)
     Qian, Richard Junqiang
     Camas, WA 98607 (US)
     Sezan, Muhammed Ibrahim
     Camas, WA 98607 (US)

(74) Representative: Muller, Hoffmann & Partner
     Patentanwalte
     Innere Wiener Strasse 17
     81667 Munchen (DE)

(54) Audio video encoding system with enhanced functionality

(57) A system includes additional information (18) together with a video stream, where the additional information
(18) is related to at least one of the frames (16). Preferably the additional information (18) is related to an object
(17a, 17b) within the frame (16). A receiver (82) receives the video and additional information (18) and decodes the
video in the same manner independently of whether the additional information (18) is provided. The additional
information (18) is selectively presented to a viewer (238) at approximately the time of receiving the frames (16).
The system may also present information to a viewer (238) from a unitary file (232, 332) containing an image and
additional information (18) associated with the image. A selection mechanism permits the selection of objects
(17a, 17b) in the image for which the additional information (18) is related thereto. A presentation mechanism
provides the additional information (18) to a viewer (238) in response to selecting the object (17a, 17b).